AITopics | cm 2

Collaborating Authors

cm 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

656f0dbf9392657eed7feefc486781fb-Supplemental.pdf

Neural Information Processing SystemsFeb-9-2026, 02:01:41 GMT

consistency, quantity, train null test, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Cross-Modal Alignment via Variational Copula Modelling

Wu, Feng, Chan, Tsai Hor, Wang, Fuying, Yin, Guosheng, Yu, Lequan

arXiv.org Machine LearningNov-6-2025

Various data modalities are common in real-world applications (e.g., electronic health records, medical images and clinical notes in healthcare). It is essential to develop multimodal learning methods to aggregate various information from multiple modalities. The main challenge is how to appropriately align and fuse the representations of different modalities into a joint distribution. Existing methods mainly rely on concatenation or the Kronecker product, oversimplifying the interaction structure between modalities and indicating a need to model more complex interactions. Additionally, the joint distribution of latent representations with higher-order interactions is underexplored. Copula is a powerful statistical structure for modelling the interactions among variables, as it naturally bridges the joint distribution and marginal distributions of multiple variables. We propose a novel copula-driven multimodal learning framework, which focuses on learning the joint distribution of various modalities to capture the complex interactions among them. The key idea is to interpret the copula model as a tool to align the marginal distributions of the modalities efficiently. By assuming a Gaussian mixture distribution for each modality and a copula model on the joint distribution, our model can generate accurate representations for missing modalities. Extensive experiments on public MIMIC datasets demonstrate the superior performance of our model over other competitors. The code is available at https://github.com/HKU-MedAI/CMCM.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2511.03196

Country:

Asia > China > Hong Kong (0.04)
North America > United States > California > Orange County > Irvine (0.04)
North America > Canada (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Sim2Dust: Mastering Dynamic Waypoint Tracking on Granular Media

Orsula, Andrej, Geist, Matthieu, Olivares-Mendez, Miguel, Martinez, Carol

arXiv.org Artificial IntelligenceOct-22-2025

Abstract-- Reliable autonomous navigation across the unstructured terrains of distant planetary surfaces is a critical enabler for future space exploration. However, the deployment of learning-based controllers is hindered by the inherent sim-to-real gap, particularly for the complex dynamics of wheel interactions with granular media. This work presents a complete sim-to-real framework for developing and validating robust control policies for dynamic waypoint tracking on such challenging surfaces. We leverage massively parallel simulation to train reinforcement learning agents across a vast distribution of procedurally generated environments with randomized physics. These policies are then transferred zero-shot to a physical wheeled rover operating in a lunar-analogue facility. Our experiments systematically compare multiple reinforcement learning algorithms and action smoothing filters to identify the most effective combinations for real-world deployment. Crucially, we provide strong empirical evidence that agents trained with procedural diversity achieve superior zero-shot performance compared to those trained on static scenarios. We also analyze the trade-offs of fine-tuning with high-fidelity particle physics, which offers minor gains in low-speed precision at a significant computational cost. T ogether, these contributions establish a validated workflow for creating reliable learning-based navigation systems, marking a substantial step towards deploying autonomous robots in the final frontier .

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2508.11503

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Comparisons to Neural Data A.1 Experimental Data

Neural Information Processing SystemsAug-14-2025, 21:33:52 GMT

"true" quantity we want to estimate

consistency, quantity, train null test, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Iterative Corpus Refinement for Materials Property Prediction Based on Scientific Texts

Zhang, Lei, Stricker, Markus

arXiv.org Artificial IntelligenceJun-11-2025

The discovery and optimization of materials for specific applications is hampered by the practically infinite number of possible elemental combinations and associated properties, also known as the `combinatorial explosion'. By nature of the problem, data are scarce and all possible data sources should be used. In addition to simulations and experimental results, the latent knowledge in scientific texts is not yet used to its full potential. We present an iterative framework that refines a given scientific corpus by strategic selection of the most diverse documents, training Word2Vec models, and monitoring the convergence of composition-property correlations in embedding space. Our approach is applied to predict high-performing materials for oxygen reduction (ORR), hydrogen evolution (HER), and oxygen evolution (OER) reactions for a large number of possible candidate compositions. Our method successfully predicts the highest performing compositions among a large pool of candidates, validated by experimental measurements of the electrocatalytic performance in the lab. This work demonstrates and validates the potential of iterative corpus refinement to accelerate materials discovery and optimization, offering a scalable and efficient tool for screening large compositional spaces where reliable data are scarce or non-existent.

composition, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.21646

Genre: Research Report (1.00)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning

Kawata, Ryotaro, Matsutani, Kohsei, Kinoshita, Yuri, Nishikawa, Naoki, Suzuki, Taiji

arXiv.org Machine LearningJun-3-2025

Mixture of Experts (MoE), an ensemble of specialized models equipped with a router that dynamically distributes each input to appropriate experts, has achieved successful results in the field of machine learning. However, theoretical understanding of this architecture is falling behind due to its inherent complexity. In this paper, we theoretically study the sample and runtime complexity of MoE following the stochastic gradient descent (SGD) when learning a regression task with an underlying cluster structure of single index models. On the one hand, we prove that a vanilla neural network fails in detecting such a latent organization as it can only process the problem as a whole. This is intrinsically related to the concept of information exponent which is low for each cluster, but increases when we consider the entire task. On the other hand, we show that a MoE succeeds in dividing this problem into easier subproblems by leveraging the ability of each expert to weakly recover the simpler function corresponding to an individual cluster. To the best of our knowledge, this work is among the first to explore the benefits of the MoE framework by examining its SGD dynamics in the context of nonlinear regression.

artificial intelligence, deep learning, machine learning, (12 more...)

arXiv.org Machine Learning

2506.01656

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > Jordan (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
(2 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

Stereo Hand-Object Reconstruction for Human-to-Robot Handover

Pang, Yik Lung, Xompero, Alessio, Oh, Changjae, Cavallaro, Andrea

arXiv.org Artificial IntelligenceDec-10-2024

Jointly estimating hand and object shape ensures the success of the robot grasp in human-to-robot handovers. However, relying on hand-crafted prior knowledge about the geometric structure of the object fails when generalising to unseen objects, and depth sensors fail to detect transparent objects such as drinking glasses. In this work, we propose a stereo-based method for hand-object reconstruction that combines single-view reconstructions probabilistically to form a coherent stereo reconstruction. We learn 3D shape priors from a large synthetic hand-object dataset to ensure that our method is generalisable, and use RGB inputs instead of depth as RGB can better capture transparent objects. We show that our method achieves a lower object Chamfer distance compared to existing RGB based hand-object reconstruction methods on single view and stereo settings. We process the reconstructed hand-object shape with a projection-based outlier removal step and use the output to guide a human-to-robot handover pipeline with wide-baseline stereo RGB cameras. Our hand-object reconstruction enables a robot to successfully receive a diverse range of household objects from the human.

artificial intelligence, handover, reconstruction, (17 more...)

arXiv.org Artificial Intelligence

2412.07487

Country: Europe (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition

Wang, Ruoyu, Cai, Chen, Wang, Wenqian, Gao, Jianjun, Lin, Dan, Liu, Wenyang, Yap, Kim-Hui

arXiv.org Artificial IntelligenceJun-18-2024

Driver action recognition has significantly advanced in enhancing driver-vehicle interactions and ensuring driving safety by integrating multiple modalities, such as infrared and depth. Nevertheless, compared to RGB modality only, it is always laborious and costly to collect extensive data for all types of non-RGB modalities in car cabin environments. Therefore, previous works have suggested independently learning each non-RGB modality by fine-tuning a model pre-trained on RGB videos, but these methods are less effective in extracting informative features when faced with newly-incoming modalities due to large domain gaps. In contrast, we propose a Continual Cross-Modal Mapping Network (CM2-Net) to continually learn each newly-incoming modality with instructive prompts from the previously-learned modalities. Specifically, we have developed Accumulative Cross-modal Mapping Prompting (ACMP), to map the discriminative and informative features learned from previous modalities into the feature space of newly-incoming modalities. Then, when faced with newly-incoming modalities, these mapped features are able to provide effective prompts for which features should be extracted and prioritized. These prompts are accumulating throughout the continual learning process, thereby boosting further recognition performances. Extensive experiments conducted on the Drive&Act dataset demonstrate the performance superiority of CM2-Net on both uni- and multi-modal driver action recognition.

encoder, modality, recognition, (16 more...)

arXiv.org Artificial Intelligence

2406.1134

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Suppressing Modulation Instability with Reinforcement Learning

Kalmykov, Nikolay, Zagidullin, Rishat, Rogov, Oleg, Rykovanov, Sergey, Dylov, Dmitry V.

arXiv.org Artificial IntelligenceApr-5-2024

Modulation instability is a phenomenon of spontaneous pattern formation in nonlinear media, oftentimes leading to an unpredictable behaviour and a degradation of a signal of interest. We propose an approach based on reinforcement learning to suppress the unstable modes by optimizing the parameters for the time modulation of the potential in the nonlinear system. We test our approach in 1D and 2D cases and propose a new class of physically-meaningful reward functions to guarantee tamed instability.

instability, modulation, modulation instability, (14 more...)

arXiv.org Artificial Intelligence

2404.0431

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Energetically Optimal Action Potentials

Neural Information Processing SystemsMar-15-2024, 09:03:25 GMT

Most action potentials in the nervous system take on the form of strong, rapid, and brief voltage deflections known as spikes, in stark contrast to other action potentials, such as in the heart, that are characterized by broad voltage plateaus. We derive the shape of the neuronal action potential from first principles, by postulating that action potential generation is strongly constrained by the brain's need to minimize energy expenditure. For a given height of an action potential, the least energy is consumed when the underlying currents obey the bang-bang principle: the currents giving rise to the spike should be intense, yet short-lived, yielding spikes with sharp onsets and offsets. Energy optimality predicts features in the biophysics that are not per se required for producing the characteristic neuronal action potential: sodium currents should be extraordinarily powerful and inactivate with voltage; both potassium and sodium currents should have kinetics that have a bell-shaped voltage-dependence; and the cooperative action of multiple'gates' should start the flow of current.

action potential, membrane, permeability, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Add feedback